1.
Sci Rep ; 14(1): 3946, 2024 02 16.
Article in English | MEDLINE | ID: mdl-38365936

ABSTRACT

The advent of single-cell RNA sequencing (scRNA-seq) technology has revolutionized our ability to explore cellular diversity and unravel the complexities of intricate diseases. However, due to the inherently low signal-to-noise ratio and the presence of an excessive number of missing values, scRNA-seq data analysis encounters unique challenges. Here, we present cnnImpute, a novel convolutional neural network (CNN) based method designed to address the issue of missing data in scRNA-seq. Our approach starts by estimating missing probabilities, followed by constructing a CNN-based model to recover expression values with a high likelihood of being missing. Through comprehensive evaluations, cnnImpute demonstrates its effectiveness in accurately imputing missing values while preserving the integrity of cell clusters in scRNA-seq data analysis. It achieved superior performance in various benchmarking experiments. cnnImpute offers an accurate and scalable method for recovering missing values, providing a useful resource for scRNA-seq data analysis.


Subjects
Gene Expression Profiling, Single-Cell Analysis, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Exome Sequencing, Probability, Cluster Analysis, RNA
2.
Int J Mol Sci ; 23(22)2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36430822

ABSTRACT

Chronic myeloid leukemia (CML) is a myeloproliferative disease characterized by a unique BCR-ABL fusion gene. Tyrosine kinase inhibitors (TKIs) were developed to target the BCR-ABL oncoprotein, inhibiting its abnormal kinase activity. TKI treatments have significantly improved CML patient outcomes. However, patients can develop drug resistance and relapse after therapy is discontinued, largely due to intratumor heterogeneity. It is critical to understand the differences in therapeutic responses among subpopulations of cells. Single-cell RNA sequencing measures the transcriptome of individual cells, allowing us to differentiate and analyze individual cell populations. Here, we integrated a single-cell RNA sequencing profile of CML stem cells and network analysis to decipher the mechanisms of distinct TKI responses. We revealed a set of genes that were concordantly differentially expressed, relative to normal hematopoietic stem cells, in various types of stem cells of CML patients. Further transcription regulatory network analysis found that most of these genes were directly controlled by one or more transcription factors and that the genes have more regulators in the cells of patients who responded to the treatment. Molecular markers, including a known drug-resistance gene and novel gene signatures for treatment response, were also identified. Moreover, we combined protein-protein interaction network construction with a cancer drug database and uncovered drugs that target the marker genes directly or indirectly via the protein interactions. The gene signatures and their interacting proteins identified by this work can be used for treatment response prediction and may lead to new strategies for drug resistance monitoring and prevention. Our single-cell-based findings offer novel insights into the mechanisms underlying the therapeutic response of CML.


Subjects
BCR-ABL Positive Chronic Myelogenous Leukemia, Transcriptome, Humans, Protein Kinase Inhibitors/pharmacology, Protein Kinase Inhibitors/therapeutic use, Antineoplastic Drug Resistance/genetics, BCR-ABL Positive Chronic Myelogenous Leukemia/drug therapy, BCR-ABL Positive Chronic Myelogenous Leukemia/genetics, BCR-ABL Positive Chronic Myelogenous Leukemia/pathology, bcr-abl Fusion Proteins
3.
Genes (Basel) ; 12(12)2021 11 27.
Article in English | MEDLINE | ID: mdl-34946850

ABSTRACT

Autism spectrum disorder (ASD) is a neurodevelopmental disorder that impedes patients' cognitive, social, speech and communication skills. ASD is highly heterogeneous with a variety of etiologies and clinical manifestations. The prevalence rate of ASD has increased steadily in recent years. Presently, the molecular mechanisms underlying ASD occurrence and development remain to be elucidated. Here, we integrated multi-layer genomics data to investigate the transcriptome and pathway dysregulations in ASD development. The RNA sequencing (RNA-seq) expression profiles of induced pluripotent stem cells (iPSCs), neural progenitor cells (NPCs) and neuron cells from ASD and normal samples were compared in our study. We found that substantially more genes were differentially expressed in the NPCs than in the iPSCs. Consistently, gene set variation analysis revealed that the activity of the known ASD pathways in NPCs and neural cells was significantly different from that in the iPSCs, suggesting that ASD occurred at the early stage of neural system development. We further constructed comprehensive brain- and neural-specific regulatory networks by incorporating transcription factor (TF) and gene interactions with long non-coding RNA (lncRNA) and protein interactions. We then overlaid the transcriptomes of different cell types on the regulatory networks to infer the regulatory cascades. The variations of the regulatory cascades between ASD and normal samples uncovered a set of novel disease-associated genes and gene interactions, particularly highlighting the functional roles of ELF3 and the interaction between STAT1 and lncRNA ELF3-AS1 in the disease development. These new findings extend our understanding of ASD and offer putative new therapeutic targets for further studies.


Subjects
Autism Spectrum Disorder/genetics, Gene Regulatory Networks/genetics, Neurons/pathology, Autism Spectrum Disorder/pathology, Gene Expression Profiling/methods, Gene Expression Regulation/genetics, Humans, Induced Pluripotent Stem Cells/pathology, Neural Stem Cells/pathology, Organogenesis/genetics, RNA Sequence Analysis/methods, Transcription Factors/genetics, Transcriptome/genetics
4.
PeerJ ; 9: e10549, 2021.
Article in English | MEDLINE | ID: mdl-33665002

ABSTRACT

Alzheimer's disease (AD) is a progressive neurodegenerative disorder, accounting for nearly 60% of all dementia cases. The occurrence of the disease has been increasing rapidly in recent years. Presently about 46.8 million individuals suffer from AD worldwide. The current absence of effective treatment to reverse or stop AD progression highlights the importance of disease prevention and early diagnosis. Brain structural Magnetic Resonance Imaging (MRI) has been widely used for AD detection as it can display morphometric differences and cerebral structural changes. In this study, we built three machine learning-based MRI data classifiers to predict AD and infer the brain regions that contribute to disease development and progression. We then systematically compared the three distinct classifiers, which were constructed based on Support Vector Machine (SVM), 3D Very Deep Convolutional Network (VGGNet) and 3D Deep Residual Network (ResNet), respectively. To improve the performance of the deep learning classifiers, we applied a transfer learning strategy. The weights of a pre-trained model were transferred and adopted as the initial weights of our models. Transferring the learned features significantly reduced training time and increased network efficiency. The classification accuracy for distinguishing AD subjects from elderly control subjects was 90%, 95%, and 95% for the SVM, VGGNet and ResNet classifiers, respectively. Gradient-weighted Class Activation Mapping (Grad-CAM) was employed to show discriminative regions that contributed most to the AD classification by utilizing the learned spatial information of the 3D-VGGNet and 3D-ResNet models. The resulting maps consistently highlighted several disease-associated brain regions, particularly the cerebellum, which is a relatively neglected brain region in current AD studies. Overall, our comparisons suggested that the ResNet model provided the best classification performance as well as more accurate localization of disease-associated regions in the brain compared to the other two approaches.

5.
J Ambient Intell Humaniz Comput ; 10(5): 2029-2040, 2019 May.
Article in English | MEDLINE | ID: mdl-31068980

ABSTRACT

With the massive volume and rapid growth of data, the study of feature spaces is of great importance. To avoid the complex training processes of deep learning models, which project the original feature space into low-dimensional ones, we propose a novel feature space learning (FSL) model. The main contributions of our approach are: (1) FSL can not only select useful features but also adaptively update feature values and span new feature spaces; (2) four FSL algorithms are proposed with the feature space updating procedure; (3) FSL can provide a better understanding of the data and learn descriptive and compact feature spaces without the demanding training required for deep architectures. Experimental results on benchmark data sets demonstrate that FSL-based algorithms performed better than classical unsupervised and semi-supervised learning algorithms, and even incremental semi-supervised algorithms. In addition, we show a visualization of the learned feature space results. With the carefully designed learning strategy, FSL dynamically disentangles explanatory factors, suppresses noise accumulation and semantic shift, and constructs easy-to-understand feature spaces.

6.
BMC Syst Biol ; 13(1): 13, 2019 Jan 22.
Article in English | MEDLINE | ID: mdl-30670065

ABSTRACT

It was highlighted that the original article [1] contained a typesetting error in the last name of Allon Canaan. This was incorrectly captured as Allon Canaann in the original article which has since been updated.

7.
Neural Process Lett ; 50(1): 103-119, 2019 Aug.
Article in English | MEDLINE | ID: mdl-35035261

ABSTRACT

Automatically describing the contents of an image using natural language has drawn much attention because it not only integrates computer vision and natural language processing but also has practical applications. Using an end-to-end approach, we propose a bidirectional semantic attention-based guiding of long short-term memory (Bag-LSTM) model for image captioning. The proposed model consciously refines image features from previously generated text. By fine-tuning the parameters of convolutional neural networks, Bag-LSTM obtains more text-related image features via feedback propagation than other models. As opposed to existing guidance-LSTM methods, which directly add image features into each unit of an LSTM block, our fine-tuned model dynamically leverages more text-conditional image features, acquired by the semantic attention mechanism, as guidance information. Moreover, we exploit bidirectional gLSTM as the caption generator, which is capable of learning long-term relations between visual features and semantic information by making use of both historical and future contextual information. In addition, variations of the Bag-LSTM model are proposed in an effort to sufficiently describe high-level visual-language interactions. Experiments on the Flickr8k and MSCOCO benchmark datasets demonstrate the effectiveness of the model compared with the baseline algorithms; for example, it scores 51.2% higher than BRNN on the CIDEr metric.

8.
BMC Syst Biol ; 12(Suppl 7): 114, 2018 12 14.
Article in English | MEDLINE | ID: mdl-30547798

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technology provides an effective way to study cell heterogeneity. However, due to the low capture efficiency and stochastic gene expression, scRNA-seq data often contain a high percentage of missing values. It has been shown that the missing rate can reach approximately 30% even after noise reduction. To accurately recover missing values in scRNA-seq data, we need to know where the missing data are, how much data is missing, and what the values of these data are. METHODS: To solve these three problems, we propose a novel model with a hybrid machine learning method, namely, missing imputation for single-cell RNA-seq (MISC). To solve the first problem, we transformed it into a binary classification problem on the RNA-seq expression matrix. Then, for the second problem, we searched for the intersection of the classification results, the zero-inflated model results and the false negative model results. Finally, we used a regression model to recover the data in the missing elements. RESULTS: We compared the raw data without imputation, the mean-smooth neighbor cell trajectory, and MISC on chronic myeloid leukemia (CML) data, the primary somatosensory cortex and the hippocampal CA1 region of mouse brain cells. On the CML data, MISC discovered a trajectory branch from CP-CML to BC-CML, which provides direct evidence of evolution from CP to BC stem cells. On the mouse brain data, MISC clearly divides the pyramidal CA1 into different branches, which is direct evidence of subpopulations within the pyramidal CA1. In addition, with MISC, the oligodendrocyte cells became an independent group with an apparent boundary. CONCLUSIONS: Our results showed that the MISC model improved the cell type classification and could be instrumental in studying cellular heterogeneity. Overall, MISC is a robust missing data imputation model for single-cell RNA-seq data.


Subjects
RNA Sequence Analysis/methods, Single-Cell Analysis, Humans, BCR-ABL Positive Chronic Myelogenous Leukemia/genetics, BCR-ABL Positive Chronic Myelogenous Leukemia/pathology, Neoplastic Stem Cells/metabolism, Neoplastic Stem Cells/pathology
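
The three steps described in the MISC abstract above (flag candidate missing entries, keep only those on which the classifier, the zero-inflated model and the false negative model agree, then predict values for them) can be sketched as follows. This is an illustrative sketch only, not the MISC implementation: the names `intersect_masks`, `impute` and `gene_mean_predictor` are hypothetical, the three boolean masks are assumed to be given, and a per-gene mean of nonzero values stands in for the regression model, which the abstract does not specify.

```python
def intersect_masks(*masks):
    """Element-wise AND of equally shaped boolean matrices (lists of lists)."""
    return [[all(cells) for cells in zip(*rows)] for rows in zip(*masks)]


def impute(expr, missing, predict):
    """Replace entries flagged in `missing` with predict(gene_index, cell_index)."""
    return [
        [predict(g, c) if missing[g][c] else value for c, value in enumerate(row)]
        for g, row in enumerate(expr)
    ]


def gene_mean_predictor(expr):
    """Stand-in predictor (assumption): mean of the nonzero values of each gene."""
    def predict(g, _c):
        observed = [v for v in expr[g] if v > 0]
        return sum(observed) / len(observed) if observed else 0.0
    return predict


# Toy genes-by-cells matrix; zeros may be true zeros or dropouts.
expr = [[0, 5, 0],
        [3, 0, 4]]

# Hypothetical outputs of the three models: True means "judged missing".
classifier     = [[True, False, True],  [False, True, False]]
zero_inflated  = [[True, False, False], [False, True, False]]
false_negative = [[True, False, True],  [False, True, True]]

# Only entries flagged by all three models are treated as missing.
agreed = intersect_masks(classifier, zero_inflated, false_negative)
imputed = impute(expr, agreed, gene_mean_predictor(expr))
```

Taking the intersection is the conservative choice: a zero is overwritten only when every model agrees that it is a dropout, so entries that any one model regards as true zeros are left untouched.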
9.
BMC Syst Biol ; 12(Suppl 7): 116, 2018 12 14.
Article in English | MEDLINE | ID: mdl-30547805

ABSTRACT

BACKGROUND: Because of the huge economic burden that obesity and diabetes place on society, they have become some of the most serious public health challenges in the world. To reveal the close and complex relationships between diabetes, obesity and other diseases, and to search for effective treatments for them, a novel model named the representative latent Dirichlet allocation (RLDA) topic model is presented. RESULTS: RLDA was applied to a corpus of more than 337,000 publications on diabetes and obesity published from 2007 to 2016. To unveil meaningful relationships between diabetes mellitus, obesity and other diseases, we performed an explicit analysis on the output of our model with a series of visualization tools. Then, using clinical reports that were not included in the training data to test the credibility of our discoveries, we found that a sufficient number of these records matched directly. Our results illustrate that in the last 10 years, for diseases accompanying obesity, scientists and researchers mainly focused on 17 of them, such as asthma, gastric disease and heart disease; the study of diabetes mellitus features a broader scope of 26 diseases, such as Alzheimer's disease and heart disease; and the two conditions share 15 accompanying diseases, listed as follows: adrenal disease, anxiety, cardiovascular disease, depression, heart disease, hepatitis, hypertension, hypothalamic disease, respiratory disease, myocardial infarction, OSAS, liver disease, lung disease, schizophrenia, tuberculosis. In addition, tumor necrosis factor, tumor, adolescent obesity or diabetes, inflammation, hypertension and cell are going to be the hot topics related to diabetes mellitus and obesity in the next few years. CONCLUSIONS: With the help of RLDA, hotspot analysis and relation discovery results on diabetes and obesity were achieved. We extracted the significant relationships between them and other diseases such as Alzheimer's disease, heart disease and tumor. It is believed that the newly proposed representation learning algorithm can help biomedical researchers better focus their attention and optimize their research direction.


Subjects
Computational Biology/methods, Diabetes Complications, Obesity/complications, Algorithms, PubMed
10.
BMC Syst Biol ; 12(Suppl 7): 117, 2018 12 14.
Article in English | MEDLINE | ID: mdl-30547817

ABSTRACT

BACKGROUND: Adenocarcinoma in situ (AIS) is a pre-invasive lesion in the lung and a subtype of lung adenocarcinoma. Patients with AIS can be cured by resecting the lesion completely. In contrast, patients with invasive lung adenocarcinoma have a very poor 5-year survival rate. AIS can develop into invasive lung adenocarcinoma. The investigation and comparison of AIS and invasive lung adenocarcinoma at the genomic level can deepen our understanding of the mechanisms underlying lung cancer development. RESULTS: In this study, we identified 61 lung adenocarcinoma (LUAD) invasive-specific differentially expressed genes, including nine long non-coding RNAs (lncRNAs), based on RNA sequencing (RNA-seq) data from normal, AIS, and invasive tissue samples. These genes displayed concordant differential expression (DE) patterns in the independent stage III LUAD tissues obtained from The Cancer Genome Atlas (TCGA) RNA-seq dataset. For individual invasive-specific genes, we constructed subnetworks using the Genetic Algorithm (GA) based on protein-protein interactions, protein-DNA interactions and lncRNA regulations. A total of 19 core subnetworks that consisted of invasive-specific genes and at least one putative lung cancer driver gene were identified by our study. Functional analysis of the core subnetworks revealed their enrichment in known pathways and biological processes responsible for tumor growth and invasion, including the VEGF signaling pathway and the negative regulation of cell growth. CONCLUSIONS: Our comparative analysis of invasive, normal and AIS cases uncovered critical genes that are involved in the progression of LUAD invasion. Furthermore, the GA-based network method revealed gene clusters that may function in the pathways contributing to tumor invasion. The interactions between differentially expressed genes and putative driver genes identified through the network analysis can offer new targets for preventing cancer invasion and potentially increase the survival rate for cancer patients.


Subjects
Adenocarcinoma of Lung/genetics, Adenocarcinoma of Lung/pathology, Systems Biology, Disease Progression, Gene Expression Profiling, Gene Regulatory Networks, Genomics, Humans, Mutation, Neoplasm Invasiveness
11.
BMC Syst Biol ; 12(Suppl 7): 91, 2018 12 14.
Article in English | MEDLINE | ID: mdl-30547845

ABSTRACT

BACKGROUND: Autism Spectrum Disorder (ASD) is the umbrella term for a group of neurodevelopmental disorders convergent on behavioral phenotypes. While many genes have been implicated in the disorder, the predominant focus of previous research has been on protein-coding genes. This leaves a vast number of long non-coding RNAs (lncRNAs) not characterized for their role in the disorder, although lncRNAs have been shown to play important roles in development and are highly represented in the brain. Studies have also shown lncRNAs to be differentially expressed in ASD-affected brains. However, there has yet to be an enrichment analysis of the shared ontologies and pathways of known ASD genes and lncRNAs in normal brain development. RESULTS: In this study, we performed co-expression network analysis on the developing brain transcriptome to identify potential lncRNAs associated with ASD and possible annotations for the functional role of lncRNAs in brain development. We found co-enrichment of lncRNA genes and ASD risk genes in two distinct groups of modules showing elevated prenatal and postnatal expression patterns, respectively. Further enrichment analysis of the module groups indicated that the early expression modules were composed mainly of transcriptional regulators, while the later expression modules were associated with synapse formation. Finally, lncRNAs were prioritized for their connectivity with the known ASD risk genes through analysis of an adjacency matrix. Collectively, the results imply early developmental repression of synaptic genes through lncRNAs and ASD transcriptional regulators. CONCLUSION: Here we demonstrate the utility of mining the publicly available brain gene expression data to further functionally annotate the role of lncRNAs in ASD. Our analysis indicates that lncRNAs potentially have a key role in ASD due to their convergence on shared pathways, and we identify lncRNAs of interest that may lead to further avenues of study.


Subjects
Autistic Disorder/genetics, Brain/growth & development, Brain/metabolism, Gene Expression Regulation, Genetic Predisposition to Disease/genetics, Long Noncoding RNA/genetics, Gene Regulatory Networks, Humans, Synapses/genetics, Genetic Transcription
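
The abstract above mentions prioritizing lncRNAs by their connectivity with known ASD risk genes in an adjacency matrix. A minimal sketch of one way such a ranking could work is given here, assuming a symmetric weighted adjacency matrix and scoring each candidate by the sum of its edge weights to the risk genes. The function name `rank_by_connectivity`, the gene labels and the weights are all hypothetical, and the abstract does not say how the authors defined connectivity.

```python
def rank_by_connectivity(adjacency, genes, candidates, risk_genes):
    """Rank candidates by summed adjacency weight to the risk genes.

    adjacency: square matrix (list of lists) indexed in the order of `genes`
    candidates, risk_genes: lists of names that appear in `genes`
    """
    index = {name: i for i, name in enumerate(genes)}
    scores = {
        c: sum(adjacency[index[c]][index[r]] for r in risk_genes)
        for c in candidates
    }
    # Highest connectivity first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy network with made-up names and weights.
genes = ["lncA", "lncB", "riskX", "riskY"]
adjacency = [
    [0.0, 0.1, 0.8, 0.6],
    [0.1, 0.0, 0.2, 0.3],
    [0.8, 0.2, 0.0, 0.5],
    [0.6, 0.3, 0.5, 0.0],
]

ranking = rank_by_connectivity(
    adjacency, genes, candidates=["lncA", "lncB"], risk_genes=["riskX", "riskY"]
)
```

In this toy network `lncA` ranks first because its edges to both risk genes carry more weight than those of `lncB`.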
12.
BMC Med Genomics ; 11(Suppl 5): 106, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30453959

ABSTRACT

BACKGROUND: Non-small cell lung cancer (NSCLC) represents more than 80% of lung cancers. The early stages of NSCLC can be treated by complete resection with a good prognosis. However, most cases are detected at a late stage of the disease. The average survival rate of patients with invasive lung cancer is only about 4%. Adenocarcinoma in situ (AIS) is an intermediate subtype of lung adenocarcinoma that exhibits early-stage growth patterns but can become invasive. METHODS: In this study, we used RNA-seq data from normal, AIS, and invasive lung cancer tissues to identify a gene module that represents the distinguishing characteristics of AIS as AIS-specific genes. Two differential expression analysis algorithms were employed to identify the AIS-specific genes. Then, the subset of best-performing AIS-specific genes for early lung cancer prediction was selected by random forest. Finally, the performance of early lung cancer prediction was assessed using random forest, support vector machine (SVM) and artificial neural networks (ANNs) on four independent early lung cancer datasets, including one tumor-educated blood platelets (TEPs) dataset. RESULTS: Based on the differential expression analysis, 107 AIS-specific genes that consisted of 93 protein-coding genes and 14 long non-coding RNAs (lncRNAs) were identified. The significant functions associated with these genes include angiogenesis and ECM-receptor interaction, which are highly related to cancer development and contribute to the smoking-free lung cancers. Moreover, 12 of the AIS-specific lncRNAs are involved in lung cancer progression by potentially regulating the ECM-receptor interaction pathway. The feature selection by random forest identified 20 of the AIS-specific genes as early-stage lung cancer signatures using the dataset obtained from The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples. Of the 20 signatures, two were lncRNAs, BLACAT1 and CTD-2527I21.15, which have been reported to be associated with bladder cancer, colorectal cancer and breast cancer. In blind classification for three independent tissue sample datasets, these signature genes consistently yielded about 98% accuracy for distinguishing early-stage lung cancer from normal cases. However, the prediction accuracy for the blood platelet samples was only 64.35% (sensitivity 78.1%, specificity 50.59%, and AUROC 0.747). CONCLUSIONS: The comparison of AIS with normal and invasive tumor revealed disease-specific genes and offered new insights into the mechanism underlying AIS progression into an invasive tumor. These genes can also serve as signatures for early diagnosis of lung cancer with high accuracy. The expression profile of gene signatures identified from tissue cancer samples yielded remarkable early cancer prediction for tissue samples but relatively lower accuracy for blood platelet samples.


Subjects
Adenocarcinoma in Situ/pathology, Lung Neoplasms/pathology, Adenocarcinoma in Situ/genetics, Area Under Curve, Genetic Databases, Disease Progression, Neoplastic Gene Expression Regulation, Humans, Lung Neoplasms/genetics, Machine Learning, Neoplasm Staging, Open Reading Frames/genetics, Long Noncoding RNA/genetics, ROC Curve, Transcriptome
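
The platelet-sample figures quoted in the abstract above (accuracy 64.35%, sensitivity 78.1%, specificity 50.59%) can be related through the identity that overall accuracy is the class-weighted mean of sensitivity and specificity. The sketch below works through that arithmetic. The helper name `accuracy_from_rates` is hypothetical, and the 50/50 class split is an assumption made only for illustration, since the abstract does not report the class sizes.

```python
def accuracy_from_rates(sensitivity, specificity, positive_fraction):
    """Accuracy as the class-weighted mean of sensitivity and specificity."""
    return positive_fraction * sensitivity + (1.0 - positive_fraction) * specificity


# Rates reported in the abstract; the equal class split is an assumption.
balanced = accuracy_from_rates(0.781, 0.5059, positive_fraction=0.5)
```

Under the equal-split assumption the result is 0.64345, which is close to the reported 64.35%. That is consistent with either roughly balanced classes or a balanced-accuracy metric, but the abstract alone does not settle which.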
13.
BMC Med Genomics ; 11(Suppl 5): 104, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30454048

ABSTRACT

BACKGROUND: Breast cancer is the most common type of invasive cancer in women. It accounts for approximately 18% of all cancer deaths worldwide. It is well known that somatic mutation plays an essential role in cancer development. Hence, we propose that a prognostic prediction model that integrates somatic mutations with gene expression can improve survival prediction for cancer patients and also reveal the genetic mutations associated with survival. METHOD: Differential expression analysis was used to identify breast cancer related genes. A genetic algorithm (GA) and univariate Cox regression analysis were applied to filter out survival related genes. DAVID was used for enrichment analysis on the somatic mutated gene set. The performance of the survival predictors was assessed by a Cox regression model and the concordance index (C-index). RESULTS: We investigated the genome-wide gene expression profile and somatic mutations of 1091 breast invasive carcinoma cases from The Cancer Genome Atlas (TCGA). We identified 118 genes with high hazard ratios as breast cancer survival risk gene candidates (log rank p < 0.0001 and c-index = 0.636). Multiple breast cancer survival related genes were found in this gene set, including FOXR2, FOXD1, MTNR1B and SDC1. A further genetic algorithm (GA) analysis revealed an optimal gene set consisting of 88 genes with a higher c-index (log rank p < 0.0001 and c-index = 0.656). We validated this gene set on an independent breast cancer data set and achieved a similar performance (log rank p < 0.0001 and c-index = 0.614). Moreover, we revealed 25 functional annotations, 15 gene ontology terms and 14 pathways that were significantly enriched in the genes that showed distinct mutation patterns in the different survival risk groups. These functional gene sets were used as new features for the survival prediction model. In particular, our results suggested that the Fanconi anemia pathway had an important role in breast cancer prognosis. CONCLUSIONS: Our study indicated that the expression levels of the gene signatures remain effective indicators for breast cancer survival prediction. Combining the gene expression information with other types of features derived from somatic mutations can further improve the performance of survival prediction. The pathways associated with survival risk suggested by our study can be further investigated for improving cancer patient survival.


Subjects
Algorithms, Breast Neoplasms/genetics, Breast Neoplasms/mortality, Breast Neoplasms/pathology, Female, Forkhead Transcription Factors/genetics, Neoplastic Gene Expression Regulation, Human Genome, Humans, Mutation, Proportional Hazards Models, Melatonin MT2 Receptor/genetics, Survival Analysis, Transcriptome
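
The abstract above reports its results as concordance index (C-index) values. For readers unfamiliar with the metric, here is a minimal sketch of Harrell's C-index: among pairs of patients whose survival order is known, it is the fraction in which the patient with the higher predicted risk had the shorter survival. The function name and toy data are hypothetical, pairs with tied times are simply skipped for brevity, and this is not presented as the authors' implementation.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index over all comparable pairs.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the subject was censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the shorter one is an event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk, shorter survival: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
inverted = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 is what an uninformative predictor achieves, and 0.0 means every pair is ranked in reverse. The values of 0.614 to 0.656 quoted in the abstract therefore sit modestly above chance.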
14.
Sci Rep ; 8(1): 8995, 2018 Jun 07.
Article in English | MEDLINE | ID: mdl-29875368

ABSTRACT

A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has not been fixed in the paper.

15.
BMC Bioinformatics ; 19(1): 181, 2018 05 24.
Article in English | MEDLINE | ID: mdl-29793423

ABSTRACT

After publication of the original article [1], it was noticed that the Acknowledgement statement was incorrect. The original statement reads.

16.
17.
Sci Rep ; 8(1): 267, 2018 01 10.
Article in English | MEDLINE | ID: mdl-29321535

ABSTRACT

The war on cancer is progressing globally but slowly as researchers around the world continue to seek and discover more innovative and effective ways of curing this catastrophic disease. Organizing biological information, representing it, and making it accessible, or biocuration, is an important aspect of biomedical research and discovery. However, because maintaining sophisticated biocuration is highly resource dependent, it continues to lag behind the biomedical data that are continually being generated. Another critical aspect of cancer research, pathway analysis, has proven to be an efficient method for gaining insight into the underlying biology associated with cancer. We propose a deep-learning-based model, Stacked Denoising Autoencoder Multi-Label Learning (SdaMLL), for facilitating gene multi-function discovery and pathway completion. SdaMLL can capture intermediate representations robust to partial corruption of the input pattern and generate low-dimensional codes superior to conditional dimension reduction tools. Experimental results indicate that SdaMLL outperforms existing classical multi-label algorithms. Moreover, we found some gene functions, such as Fused in Sarcoma (FUS, which may be part of transcriptional misregulation in cancer) and p27 (which we expect will become a member of viral carcinogenesis), that can be used to complete the related pathways. We provide a visual tool ( https://www.keaml.cn/gpvisual ) to view the new gene functions in cancer pathways.


Subjects
Computational Biology/methods, Genetic Association Studies, Genetic Predisposition to Disease, Machine Learning, Molecular Sequence Annotation, Neoplasms/genetics, Algorithms, Genetic Databases, Genetic Association Studies/methods, Humans, Neoplasms/metabolism, Neoplasms/pathology
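
The phrase "robust to partial corruption of the input pattern" in the abstract above refers to the training idea behind denoising autoencoders: the network receives a deliberately corrupted input and is trained to reconstruct the clean one. The sketch below shows only the corruption step and the reconstruction error, with masking noise (randomly zeroing entries) assumed as the corruption type. The abstract does not state which noise SdaMLL uses, and the function names are hypothetical.

```python
import random


def mask_corrupt(vector, corruption_level, rng):
    """Zero each entry independently with probability `corruption_level`."""
    return [0.0 if rng.random() < corruption_level else x for x in vector]


def reconstruction_error(clean, reconstructed):
    """Mean squared error between the clean input and a reconstruction."""
    return sum((c - r) ** 2 for c, r in zip(clean, reconstructed)) / len(clean)


rng = random.Random(0)                      # seeded for repeatability
clean = [0.2, 1.5, 0.0, 3.1, 0.7]
noisy = mask_corrupt(clean, corruption_level=0.3, rng=rng)

# A denoising autoencoder would be trained to map `noisy` back to `clean`.
# With no model here, the corrupted input itself gives a baseline error.
baseline = reconstruction_error(clean, noisy)
```

Because the target is always the uncorrupted vector, the encoder cannot simply copy its input and has to learn structure that lets it fill in the zeroed entries.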
18.
Genes (Basel) ; 9(1)2018 Jan 05.
Article in English | MEDLINE | ID: mdl-29303984

ABSTRACT

Lung cancer is the second most commonly diagnosed carcinoma and is the leading cause of cancer death. Although significant progress has been made towards its understanding and treatment, unraveling the complexities of lung cancer is still hampered by a lack of comprehensive knowledge on the mechanisms underlying the disease. High-throughput and multidimensional genomic data have shed new light on cancer biology. In this study, we developed a network-based approach integrating somatic mutations, the transcriptome, DNA methylation, and protein-DNA interactions to reveal the key regulators in lung adenocarcinoma (LUAD). By combining Bayesian network analysis with tissue-specific transcription factor (TF) and targeted gene interactions, we inferred 15 disease-related core regulatory networks in co-expression gene modules associated with LUAD. Through target gene set enrichment analysis, we identified a set of key TFs, including known cancer genes that potentially regulate the disease networks. These TFs were significantly enriched in multiple cancer-related pathways. Specifically, our results suggest that hepatitis viruses may contribute to lung carcinogenesis, highlighting the need for further investigations into the roles that viruses play in treating lung cancer. Additionally, 13 putative regulatory long non-coding RNAs (lncRNAs), including three that are known to be associated with lung cancer, and nine novel lncRNAs were revealed by our study. These lncRNAs and their target genes exhibited high interaction potentials and demonstrated significant expression correlations between normal lung and LUAD tissues. We further extended our study to include 16 solid-tissue tumor types and determined that the majority of these lncRNAs have putative regulatory roles in multiple cancers, with a few showing lung-cancer specific regulations. Our study provides a comprehensive investigation of transcription factor and lncRNA regulation in the context of LUAD regulatory networks and yields new insights into the regulatory mechanisms underlying LUAD. The novel key regulatory elements discovered by our research offer new targets for rational drug design and accompanying therapeutic strategies.

19.
Comput Struct Biotechnol J ; 15: 463-470, 2017.
Article in English | MEDLINE | ID: mdl-29158875

ABSTRACT

Clear cell renal cell carcinoma (ccRCC) is the most common and most aggressive form of renal cell cancer (RCC). The incidence of RCC has increased steadily in recent years, yet its pathogenesis remains poorly understood. Many of the tumor suppressor genes, oncogenes, and dysregulated pathways in ccRCC have yet to be identified, which limits improvement of the overall clinical outlook of the disease. Here, we developed a systems biology approach to prioritize the somatically mutated genes that lead to dysregulation of pathways in ccRCC. The method integrated multi-layer information to infer causative mutations and disease genes. First, we identified differential gene modules in ccRCC by coupling transcriptome data with protein-protein interactions. Each of these modules consisted of interacting genes that were involved in similar biological processes, and their combined expression alterations were significantly associated with disease type. Subsequent gene module-based eQTL analysis then revealed somatically mutated genes that had driven the expression alterations of the differential gene modules. Our study yielded a list of candidate disease genes, including several known ccRCC causative genes such as BAP1 and PBRM1, as well as novel genes such as NOD2, RRM1, CSRNP1, SLC4A2, TTLL1 and CNTN1. The differential gene modules and their driver genes revealed by our study provided a new perspective for understanding the molecular mechanisms underlying the disease. Moreover, we validated the results in independent ccRCC patient datasets. Our study provided a new method for prioritizing disease genes and pathways.
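
The abstract does not specify how the module-based eQTL step was computed. As a rough illustration only, the sketch below assumes that a module is summarized by the mean expression of its genes and that association with a gene's mutation status is assessed with a permutation test. The function names, the scoring choice, and the toy data are all hypothetical and are not taken from the study.

```python
import random
from statistics import mean


def module_score(expression, genes):
    """Summarize a module as the mean expression of its genes in each sample.

    `expression` maps gene name -> list of per-sample values (assumed layout).
    """
    n_samples = len(next(iter(expression.values())))
    return [mean(expression[g][i] for g in genes) for i in range(n_samples)]


def permutation_pvalue(scores, mutated, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean module score
    between samples that carry the mutation and samples that do not."""

    def abs_diff(labels):
        mut = [s for s, m in zip(scores, labels) if m]
        wild = [s for s, m in zip(scores, labels) if not m]
        return abs(mean(mut) - mean(wild))

    observed = abs_diff(mutated)
    rng = random.Random(seed)
    labels = list(mutated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between mutation and expression
        if abs_diff(labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data invented for illustration: 2 module genes x 8 samples.
expression = {
    "geneA": [5.1, 5.3, 4.9, 5.2, 2.0, 2.2, 1.9, 2.1],
    "geneB": [6.0, 6.2, 5.8, 6.1, 3.1, 2.9, 3.0, 3.2],
}
mutated = [True, True, True, True, False, False, False, False]

scores = module_score(expression, ["geneA", "geneB"])
p_value = permutation_pvalue(scores, mutated)
print(f"module-level association p-value: {p_value:.3f}")
```

In a real analysis this test would be repeated for every mutated gene and module pair, with correction for multiple testing.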

20.
BMC Bioinformatics ; 18(Suppl 14): 489, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29297275

ABSTRACT

BACKGROUND: Long noncoding RNAs (lncRNAs) are involved in diverse biological processes and play an essential role in various human diseases. The number of lncRNAs identified has increased rapidly in recent years owing to RNA sequencing (RNA-Seq) technology. However, presently, most lncRNAs are not well characterized, and their regulatory mechanisms remain elusive. Many lncRNAs show poor evolutionary conservation. Thus, the lncRNAs that are conserved across species can provide insight into their critical functional roles. RESULTS: Here, we performed an orthologous analysis of lncRNAs in human and rat brain tissues. Over two billion RNA-Seq reads generated from 80 human and 66 rat brain tissue samples were analyzed. Our analysis revealed a total of 351 conserved human lncRNAs corresponding to 646 rat lncRNAs. Among these human lncRNAs, 140 were newly identified by our study, and 246 were present in known lncRNA databases; however, the majority of the lncRNAs that have been identified are not yet functionally annotated. We constructed co-expression networks based on the expression profiles of conserved human lncRNAs and protein-coding genes, and produced 79 co-expression modules. Gene ontology (GO) analysis of the co-expression modules suggested that the conserved lncRNAs were involved in various functions such as brain development (P-value = 1.12E-2), nervous system development (P-value = 1.26E-3), and cerebral cortex development (P-value = 1.31E-2). We further predicted the interactions between lncRNAs and protein-coding genes to better understand the regulatory mechanisms of lncRNAs. Moreover, we investigated the expression patterns of the conserved lncRNAs at different time points during rat brain growth. We found that the expression levels of three out of four such lncRNA genes continuously increased from week 2 to week 104, which is consistent with our functional annotation. 
CONCLUSION: Our orthologous analysis of lncRNAs in human and rat brain tissues revealed a set of conserved lncRNAs. Further expression analysis provided the functional annotation of these lncRNAs in humans and rats. Our results offer new targets for developing better experimental designs to investigate regulatory molecular mechanisms of lncRNAs and the roles lncRNAs play in brain development. Additionally, our method could be generalized to study and characterize lncRNAs conserved in other species and tissue types.
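
The abstract does not state which algorithm was used to build the co-expression networks and modules. As a minimal sketch under stated assumptions, the example below links genes whose expression profiles have an absolute Pearson correlation above a threshold and treats each connected component as a module. The threshold, the function names, and the toy profiles are hypothetical, and dedicated tools use more refined clustering.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def coexpression_modules(profiles, threshold=0.9):
    """Group genes into modules: connected components of the graph whose
    edges join genes with |Pearson r| >= threshold."""
    genes = list(profiles)
    neighbors = {g: set() for g in genes}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if abs(pearson(profiles[a], profiles[b])) >= threshold:
                neighbors[a].add(b)
                neighbors[b].add(a)

    modules, seen = [], set()
    for g in genes:
        if g in seen:
            continue
        stack, component = [g], set()
        while stack:  # depth-first walk over one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        modules.append(sorted(component))
    return modules


# Toy profiles invented for illustration: two lncRNAs and two coding genes.
profiles = {
    "lnc1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneX": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneY": [5.0, 3.0, 4.0, 1.0, 2.0],
    "lnc2": [10.0, 6.0, 8.0, 2.0, 4.5],
}

modules = coexpression_modules(profiles)
print(modules)
```

Grouping an lncRNA with protein-coding genes in the same module is what allows functions to be suggested for the lncRNA from the annotations of its co-expressed genes.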


Subjects
Brain/metabolism , Conserved Sequence/genetics , RNA, Long Noncoding/genetics , Animals , Gene Expression Profiling , Gene Ontology , Gene Regulatory Networks , Humans , Molecular Sequence Annotation , Open Reading Frames/genetics , Protein Isoforms/genetics , Protein Isoforms/metabolism , Rats , Time Factors